Old and new challenges in automatic plagiarism detection

Author

  • Paul Clough
Abstract

Automatic methods for measuring the similarity between pairs of programs, and between pairs of natural language texts, have been used for many years to assist humans in detecting plagiarism. For example, over the past thirty years or so, a vast number of approaches have been proposed for detecting likely plagiarism between programs written by Computer Science students. More recently, attention has turned to identifying similarities between natural language texts, but given the ambiguity and complexity of natural languages compared with programming languages, this task is very difficult. Automatic detection is gaining further interest from both the academic and commercial worlds given the ease with which texts can now be found, copied and rewritten. Following the recent increase in the popularity of on-line plagiarism detection services and the increased publicity surrounding cases of plagiarism in academia and industry, this paper explores the nature of the plagiarism problem and, in particular, summarises the approaches used so far for its detection. I focus on plagiarism detection in natural language, and discuss a number of methods I have used to measure text reuse. I end by making a number of recommendations for further work in the field of automatic plagiarism detection.

Similar resources

Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection

Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attentio...

Corpus and Evaluation Measures for Automatic Plagiarism Detection

The simple access to texts on digital libraries and the WWW has led to an increased number of plagiarism cases in recent years, which renders manual plagiarism detection infeasible at large. Various methods for automatic plagiarism detection have been developed whose objective is to assist human experts to analyze documents for plagiarism. Unlike other tasks in natural language processing and i...

Automatic Student Plagiarism Detection: Future Perspectives

The availability and use of computers in teaching has seen an increase in the rate of plagiarism among students because of the wide availability of electronic texts online. While computer tools that have appeared in recent years are capable of detecting simple forms of plagiarism, such as copy-paste, a number of recent research studies devoted to the evaluation and comparison of plagiarism dete...

The Encoplot Similarity Measure for Automatic Detection of Plagiarism - Notebook for PAN at CLEF 2011

This paper describes the evolution of our method Encoplot for automatic plagiarism detection and the results of our participation in the PAN’11 competition. The main novelties are the introduction of a new similarity measure and of a new ranking method, which cooperate to rank the source-suspicious document pairs much better when selecting the candidates for the detailed analysis phase. We hav...

Who's the Thief? Automatic Detection of the Direction of Plagiarism

Determining the direction of plagiarism (who plagiarized whom in a given pair of documents) is one of the most interesting problems in the field of automatic plagiarism detection. We present here an approach using an extension of the method Encoplot, which won the 1st international competition on plagiarism detection in 2009. We have tested it on a large-scale corpus of artificial plagiarism, w...

Publication date: 2003